Amendments to the Claims:

This listing of claims will replace all prior versions, and listings, of claims in the
application. Applicant has submitted a new complete claim set showing any marked-up claims,
with insertions indicated by underlining and deletions indicated by strikeouts and/or double
bracketing.

Listing of Claims: 

1. (Currently amended) A process for testing an evaluation data record having attribute fields 
containing data comprising: 

providing a reference table having a number of reference records against which an
evaluation data record is tested;

identifying reference table tokens contained within the reference records of the reference 
table and determining a count of tokens in the reference table classified according to attribute 
field; and 

assigning a similarity score to said evaluation data record in relation to a reference record 
within the reference table based on a combination of: 

the number of common tokens of an evaluation field of the [[input]] evaluation data
record and a corresponding field within a reference record from the reference table;

the similarity of the tokens that are not the same in the evaluation field of the
[[input]] evaluation data record and the corresponding field of the reference record from the
reference table; and

a weight of the tokens of the evaluation data record that is based on a count of the
tokens from a corresponding field contained within the reference table.
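
By way of illustration only, the three factors recited in claim 1 might be combined as in the following sketch. It assumes an IDF-style weight (the logarithm of the number of records divided by the token's frequency in the field) and uses Python's `difflib.SequenceMatcher` ratio as the similarity between non-identical tokens. Neither formula is recited in the claims, and the names `token_weights` and `field_similarity` are hypothetical.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def token_weights(reference_column):
    """IDF-style weights (an assumption): rarer tokens in the column weigh more."""
    n = len(reference_column)
    freq = Counter(tok for value in reference_column for tok in set(value.split()))
    return {tok: math.log(n / count) + 1.0 for tok, count in freq.items()}, n


def field_similarity(eval_field, ref_field, weights, n):
    """Weighted score in [0, 1] for one attribute field."""
    ref_tokens = ref_field.split()
    total = 0.0
    matched = 0.0
    for tok in eval_field.split():
        # Tokens absent from the reference column are weighted as if seen once.
        w = weights.get(tok, math.log(n) + 1.0)
        total += w
        if tok in ref_tokens:
            matched += w  # common token: full credit
        elif ref_tokens:
            # Non-identical token: partial credit from its closest counterpart.
            best = max(SequenceMatcher(None, tok, r).ratio() for r in ref_tokens)
            matched += w * best
    return matched / total if total else 0.0
```

Under these assumptions, a misspelled evaluation field such as "beoing company" scores higher against "boeing company" than against "bon corporation", because the frequent token "company" carries less weight than the rare token it nearly matches.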

2. (Original) The process of claim 1 wherein a look-up table based on contents of reference
records in the reference table is prepared before evaluation of the evaluation data record and 
wherein the tokens of the evaluation data record are evaluated by comparing the contents of the 
look-up table with contents of the tokens of said evaluation data record to prepare a candidate set 
of reference records for which a similarity score is assigned. 

3. (Original) The process of claim 2 additionally comprising a step of evaluating tokens in the 
reference table by: 

breaking tokens in the reference table up into sets of substrings having a length q; 

applying a function to the set of substrings for a token to provide a vector representative 
of a token; and 

building a lookup table for substrings found within the tokens that make up the reference
table.
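
By way of illustration only, the substring and vector steps of claim 3 might be sketched as follows. The sketch assumes the "function" is a min-hash: coordinate i of the vector holds the q-gram that minimizes the i-th seeded hash. The claim does not name a particular function, and `qgrams`, `_hash`, and `signature` are hypothetical names.

```python
import hashlib


def qgrams(token, q=3):
    """Set of all length-q substrings of a token (the token itself if shorter)."""
    if len(token) <= q:
        return {token}
    return {token[i:i + q] for i in range(len(token) - q + 1)}


def _hash(seed, gram):
    """Deterministic seeded hash; hashlib is used because built-in hash() is salted."""
    digest = hashlib.sha256(f"{seed}:{gram}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def signature(token, q=3, h=2):
    """Min-hash style vector: coordinate i holds the q-gram minimizing hash i."""
    grams = qgrams(token, q)
    return [min(grams, key=lambda g: _hash(i, g)) for i in range(h)]
```

Tokens that share most of their q-grams tend to agree on at least one coordinate, which is what lets a lookup on vector coordinates retrieve near matches.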

4. (Original) The process of claim 3 wherein the process of building the lookup table creates an 
entry for each substring comprising: an attribute field for said substring, a co-ordinate within a 
vector for said substring, a frequency of said substring, and a list of reference records where said 
substring appears in the specified attribute field and vector co-ordinate position. 
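
By way of illustration only, a lookup table with the entry layout of claim 4 might be built as in the sketch below. It assumes the reference table is a mapping from record identifier to a dictionary of attribute fields, and it uses a compact stand-in min-hash helper. The names `signature` and `build_lookup` and the dictionary layout are hypothetical.

```python
import hashlib
from collections import defaultdict


def signature(token, q=3, h=2):
    """Stand-in min-hash vector of q-grams (hypothetical helper)."""
    grams = {token[i:i + q] for i in range(max(1, len(token) - q + 1))}

    def hashed(seed, gram):
        return hashlib.sha256(f"{seed}:{gram}".encode("utf-8")).hexdigest()

    return [min(grams, key=lambda g: hashed(i, g)) for i in range(h)]


def build_lookup(reference_table):
    """Map (substring, field, coordinate) to its frequency and record list."""
    postings = defaultdict(list)
    for record_id, record in reference_table.items():
        for field, text in record.items():
            for token in text.lower().split():
                for coord, gram in enumerate(signature(token)):
                    key = (gram, field, coord)
                    if record_id not in postings[key]:
                        postings[key].append(record_id)
    # Each entry carries its frequency alongside the list of records.
    return {key: {"frequency": len(ids), "records": ids}
            for key, ids in postings.items()}
```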

5. (Original) The process of claim 4 wherein the weights that are assigned to tokens of the 
evaluation record are distributed across candidate records from the reference table during a 
determination of a candidate set of records.

6. (Original) The process of claim 4 wherein a candidate record table is built and records listed in 
the lookup table are added to the candidate record table based on vector representations of the 
tokens of the input record. 

7. (Original) The process of claim 6 wherein a candidate record is added to the candidate record 
table only if a score assigned to the reference record can exceed a threshold based on an already 
evaluated substring. 
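
By way of illustration only, the candidate table of claims 6 and 7 might be filled as in the sketch below. It assumes each substring of the input record carries a share of the token weight, and that a record not yet in the table is admitted only while the weight still unprocessed can exceed the threshold. The function name and the data layout are hypothetical.

```python
def candidate_scores(query_grams, postings, threshold):
    """Accumulate weights per record, skipping records that cannot reach threshold.

    query_grams: list of (key, weight) pairs taken from the input record.
    postings: mapping from key to the list of record ids containing it.
    """
    remaining = sum(weight for _, weight in query_grams)
    scores = {}
    for key, weight in query_grams:
        for record_id in postings.get(key, []):
            if record_id in scores:
                scores[record_id] += weight
            elif remaining > threshold:
                # A new record can collect at most the weight not yet consumed.
                scores[record_id] = weight
        remaining -= weight
    return scores
```

Processing the heaviest substrings first makes the pruning bite sooner, since `remaining` then drops below the threshold after fewer lookups.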

8. (Currently amended) A process for testing an evaluation data record having attribute fields 
containing data comprising: 

providing a reference table having a number of reference records against which an
evaluation data record is tested;

identifying reference table tokens contained within the reference records of the reference 
table and determining a count of tokens in the reference table classified according to attribute 
field; 

assigning a similarity score to said evaluation data record in relation to a reference record 
within the reference table based on a combination of: 

the number of common tokens of an evaluation field of the input data record and a 
corresponding field within a reference record from the reference table; 

the similarity of the tokens that are not the same in the evaluation field of the 
input data record and the corresponding field of the reference record from the reference
table; and

Type of Response: Amendment
Application Number: 10/600,083
Attorney Docket Number: 301555.01
Filing Date: June 20, 2003

a weight of the tokens of the evaluation data record that is based on a count of the 
tokens from a corresponding field contained within the reference table; 

wherein a look-up table based on contents of reference records in the reference table is 
prepared before evaluation of the evaluation data record and wherein the tokens of the evaluation 
data record are evaluated by comparing the contents of the look-up table with contents of the 
tokens of said evaluation data record to prepare a candidate set of reference records for which a 
similarity score is assigned; 

evaluating tokens in the reference table by: 

breaking tokens in the reference table up into sets of substrings having a length q; 

applying a function to the set of substrings for a token to provide a vector 
representative of a token; and 

building a lookup table for substrings found within the tokens that make up the 
reference table, wherein the process of building the lookup table creates an entry for each 
substring comprising: an attribute field for said substring, a co-ordinate within a vector 
for said substring, a frequency of said substring, and a list of reference records where said 
substring appears in the specified attribute field and vector co-ordinate position; 

wherein a candidate record table is built and records listed in the lookup table are added 
to the candidate record table based on vector representations of the tokens of the input record; 
and 

[[The process of claim 6]] wherein once a likely reference record that matches the evaluation
data record with a specified degree of certainty is found, further searching for records in the
reference table is stopped.

9. (Original) The process of claim 1 wherein a closest K reference records from the reference
table are identified as possible matches with the input record.

10. (Original) The process of claim 1 wherein reference records having a similarity score greater 
than a threshold are identified as candidate records. 
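
By way of illustration only, the two selection rules of claims 9 and 10 might be sketched as follows, assuming the similarity scores are held in a mapping from record identifier to score. The function names are hypothetical.

```python
import heapq


def closest_k(scores, k):
    """Return the k (record id, score) pairs with the highest scores, best first."""
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


def above_threshold(scores, threshold):
    """Return the records whose score is strictly greater than the threshold."""
    return {rid: s for rid, s in scores.items() if s > threshold}
```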

11. (Original) The process of claim 2 additionally comprising a step of evaluating tokens in the 
reference table by applying a function to the set of substrings for a token to provide a vector 
representative of a token; and further comprising preparing the look-up table for tokens that 
make up the reference table by creating an entry in the look-up table for a token including an 
attribute field for the token or a substring, an attribute field for a co-ordinate within a vector for 
said token or substring, an attribute field for a frequency of said token or substring, and a list of 
reference records where said token or said substring appears in the specified field and vector co- 
ordinate position. 

12. (Currently amended) A process for testing an evaluation data record having attribute fields 
containing data comprising: 

providing a reference table having a number of reference records against which an
evaluation data record is tested;

identifying reference table tokens contained within the reference records of the reference 
table and determining a count of tokens in the reference table classified according to attribute 
field; 

assigning a similarity score to said evaluation data record in relation to a reference record 
within the reference table based on a combination of:

the number of common tokens of an evaluation field of the input data record and a
corresponding field within a reference record from the reference table; 

the similarity of the tokens that are not the same in the evaluation field of the 
input data record and the corresponding field of the reference record from the reference 
table; and 

a weight of the tokens of the evaluation data record that is based on a count of the 
tokens from a corresponding field contained within the reference table; and 

[[The process of claim 1 additionally comprising the step of]] maintaining a token frequency
cache in a high speed access memory for use in assigning weights to said tokens.
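
By way of illustration only, a token frequency cache of the kind recited in claim 12 might be sketched as follows. It assumes token counts normally come from a slower store (for example, a database query) and places an in-memory least-recently-used cache in front of it with `functools.lru_cache`. The wrapper name and the `fetch_frequency` callable are hypothetical.

```python
from functools import lru_cache


def make_cached_frequency(fetch_frequency, maxsize=100_000):
    """Wrap a slow (field, token) -> count lookup with an in-memory LRU cache."""

    @lru_cache(maxsize=maxsize)
    def cached(field, token):
        return fetch_frequency(field, token)

    return cached
```

Repeated weight computations for the same field and token are then served from memory, and only the first request reaches the slower store.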

13. (Original) The process of claim 1 wherein the tokens in different attribute fields are assigned 
different weights in determining said score. 

14. (Original) The process of claim 1 wherein assigning a score includes determining a cost in 
transposing the order of two tokens in determining a similarity between tokens of the input data 
record and records in the reference table. 

15. (Original) The process of claim 14 wherein the determining of a cost in transposing tokens 
takes into account a weight of said tokens that are transposed. 
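
By way of illustration only, a transposition cost that takes token weights into account (claims 14 and 15) might be folded into a token-level edit distance as in the sketch below. It assumes an optimal-string-alignment style recurrence over tokens, a replacement cost scaled by `difflib.SequenceMatcher` dissimilarity, and a swap cost equal to a fraction of the combined weight of the two swapped tokens. The parameter `swap_factor` and the function name are hypothetical, and the claims do not recite this recurrence.

```python
from difflib import SequenceMatcher


def transformation_cost(src, dst, weight, swap_factor=0.25):
    """Weighted token edit distance allowing swaps of adjacent tokens.

    src, dst: lists of tokens; weight: callable mapping a token to a float.
    """
    n, m = len(src), len(dst)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + weight(src[i - 1])  # delete every source token
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + weight(dst[j - 1])  # insert every target token
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a, b = src[i - 1], dst[j - 1]
            dissimilarity = 1.0 - SequenceMatcher(None, a, b).ratio()
            d[i][j] = min(
                d[i - 1][j] + weight(a),                      # delete a
                d[i][j - 1] + weight(b),                      # insert b
                d[i - 1][j - 1] + weight(a) * dissimilarity,  # replace a by b
            )
            if i > 1 and j > 1 and a == dst[j - 2] and src[i - 2] == b:
                # Swap cost depends on the weights of the two swapped tokens.
                swap = swap_factor * (weight(a) + weight(b))
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + swap)
    return d[n][m]
```

With this choice, reordering two tokens costs less than deleting one and reinserting it, and swapping heavily weighted tokens costs more than swapping common ones.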

16. (Currently amended) A system for evaluating an input data record having fields containing 
data comprising: 

a database for storing a reference table having a number of records against which an input 
data record is evaluated;

a preprocessor component for evaluating records in the reference table to identify tokens 
and determining a count of tokens in the reference table classified according to record field; and 

a matching component for assigning a score to an input data record in relation to a 
reference record within the reference table based on a combination of: 

i) the number of common tokens of an evaluation field of the input data record 
and a corresponding field within a reference record from the reference table; 

ii) the similarity of the tokens that are not the same in the evaluation field of the 
input data record and the corresponding field of the reference record from the reference 
table; and 

iii) a weight of the tokens of the [[evaluation]] input data record that is based on a
count of the tokens from the corresponding field contained within the reference table. 

17. (Original) The system of claim 16 wherein the preprocessor component evaluates tokens in 
the reference table by: 

breaking tokens in the reference table up into sets of substrings having a length q; 

applying a hash function to the set of substrings for a token to provide a vector 
representative of a token; and 

building a lookup table for substrings found within the tokens that make up the reference
table.



18. (Original) The system of claim 17 wherein the preprocessor creates an entry in the lookup 
table for each substring, an attribute field for said substring, a co-ordinate within a vector for said 
substring, and a list of reference records where said substring appears in the specified attribute
field and vector co-ordinate position. 

19. (Currently amended) A process for evaluating an input data record having attribute fields 
containing data comprising: 

providing a number of reference records organized into attribute fields against which an 
input data record is evaluated; 

evaluating reference records to identify tokens from said attribute fields and then 
evaluating each token to build a vector of token substrings that represent the token; 

building an index table wherein entries of the index table contain a token substring and a
list of reference records that contain a token that maps to the token substring; and 

looking up reference records in the index table based on the contents of the input record 
and selecting a number of candidate records from the reference records in the index table for 
comparing to said input data record; and

assigning a similarity score to said input data record in relation to a candidate set of 
reference records based on a combination of: 

the number of common tokens of an evaluation field of the input data record and a 
corresponding field within a reference record; 

the similarity of the tokens that are not the same in the evaluation field of the 
input data record and the corresponding field of the reference record; and 

a weight of the tokens in the evaluation field of the input data record based on a 
count of the tokens from the corresponding field contained within the reference records.

20. Canceled. 

21. (Original) The process of claim 19 wherein a candidate record table is built and candidate 
records from the index table are added to a candidate record table based on an H dimensional 
vector of token substrings determined from tokens contained in the input record. 

22. (Original) The process of claim 21 wherein tokens are parsed from the input data record and 
tokens contained in said input data record are assigned token weights based on occurrences of 
the tokens in the reference table and further wherein records added to the candidate record table 
are factored by an amount corresponding to the weights of tokens extracted from the input data 
record. 

23. (Original) The process of claim 22 wherein weights are assigned to tokens based on the 
attribute field in which the tokens are contained in the reference table. 

24. (Original) The process of claim 19 additionally comprising a step of assigning a similarity 
score to said input data record in relation to a candidate set of reference records based on:

a cost in converting tokens in the input data record to tokens in a corresponding field of a 
reference record wherein the cost is based on a weight of the tokens in the corresponding field of 
said reference record corresponding to a count of the tokens from the corresponding field 
contained within the reference records. 

25. (Original) The process of claim 19 wherein the reference records are stored in a reference 
table and wherein a candidate record table is built and candidate records from the index table are
added to a candidate record table based on token substrings contained in the input record and 
wherein tokens contained in said input data record are assigned token weights based on 
occurrences of the tokens in the reference table and further wherein records added to the 
candidate record table are factored by an amount corresponding to the weights of tokens 
contained in the input data record. 

26. (Original) The process of claim 21 wherein a candidate record is added to the candidate 
record table only if a possible score assigned to the reference record in the reference table can 
exceed a threshold based on an already evaluated substring. 

27. (Currently amended) A process for evaluating an input data record having attribute fields 
containing data comprising: 

providing a number of reference records organized into attribute fields against which an 
input data record is evaluated; 

evaluating reference records to identify tokens from said attribute fields and then 
evaluating each token to build a vector of token substrings that represent the token; 

building an index table wherein entries of the index table contain a token substring and a
list of reference records that contain a token that maps to the token substring; 

looking up reference records in the index table based on the contents of the input record 
and selecting a number of candidate records from the reference records in the index table for 
comparing to said input data record; 

wherein a candidate record table is built and candidate records from the index table are 
added to a candidate record table based on an H dimensional vector of token substrings 
determined from tokens contained in the input record;

wherein a candidate record is added to the candidate record table only if a possible score 
assigned to the reference record in the reference table can exceed a threshold based on an already 
evaluated substring; and 

[[The process of claim 26]] wherein once a likely reference record that matches the
evaluation data record with a specified degree of certainty is found, further searching for
reference records in the reference table is stopped.

28. (Currently amended) The process of claim [[30]] 19 wherein a closest K reference records from
the reference table are identified as possible matches with the input record. 

29. (Currently amended) The process of claim [[30]] 19 wherein reference records having a
similarity score greater than a threshold are identified as candidate records. 

30. (Currently amended) A process for evaluating an input data record having attribute fields 
containing data comprising: 

providing a number of reference records organized into attribute fields against which an 
input data record is evaluated; 

evaluating reference records to identify tokens from said attribute fields and then 
evaluating each token to build a vector of token substrings that represent the token; 

building an index table wherein entries of the index table contain a token substring and a
list of reference records that contain a token that maps to the token substring; 

looking up reference records in the index table based on the contents of the input record 
and selecting a number of candidate records from the reference records in the index table for
comparing to said input data record; 

assigning a similarity score to said input data record in relation to a candidate set of 
reference records based on a combination of: 

the number of common tokens of an evaluation field of the input data record and a 
corresponding field within a reference record; 

the similarity of the tokens that are not the same in the evaluation field of the 
input data record and the corresponding field of the reference record; and 

a weight of the tokens in the evaluation field of the input data record based on a 
count of the tokens from the corresponding field contained within the reference records; 
and 

[[The process of claim 20 additionally comprising the step of]] maintaining a token
frequency cache in a high speed access memory for use in assigning weights to said tokens.

31. (Currently amended) The process of claim [[30]] 19 wherein the tokens in different attribute
fields are assigned different weights in determining said score. 

32. (Original) The process of claim 19 wherein the index table additionally comprises an 
attribute field for a token from which a substring is derived. 

33. (Original) The process of claim 19 wherein the vector is an H dimensional vector of token 
substrings and the index table entries also contain an attribute field, a position within the H 
dimensional vector and a frequency of reference records that map to the token substring 
contained in an index table entry.

34. (Original) A system for evaluating an input data record having fields containing data 
comprising: 

a database for storing a reference table having a number of reference records against 
which an input data record is evaluated; 

a preprocessor component for evaluating reference records in the reference table to 
identify tokens and determining a count of tokens in the reference table classified according to 
record field; said preprocessor evaluating reference records to identify tokens from said attribute 
fields and then evaluating each token to build an H dimensional vector of token substrings that
represent the token; and building an index table wherein entries of the index table contain a
token substring, an attribute field, a position within the H dimensional vector, and a list of
reference records; and 

a matching component for assigning a score to an input data record in relation to a 
reference record within the reference table by building a candidate record table of candidate 
records from the index table based on an H dimensional vector of token substrings determined 
from tokens contained in the input record and assigning a score to said candidate records based 
on a weight of the tokens of the input data record that is based on a count of the tokens from the 
corresponding field contained within the reference table. 

35. (Currently amended) A data structure encoded on a computer readable medium for use in 
evaluating an input data record having fields containing data comprising: 

a reference table organized in attribute columns having a number of records against 
which an input data record is evaluated; and

an index table wherein each entry of the index table contains a token substring from a
token in the reference table, a column of the reference table having said token from which the
token substring is derived, a position within an H dimensional vector based on said token, and a
list of records contained within the reference table;

wherein a similarity score is assigned to the input data record in relation to a record 
within the reference table based on a weight of tokens of the input data record that is based on a 
count of the tokens from a corresponding field contained within the reference table.

36. (Original) The data structure of claim 35 wherein each entry of the index table additionally 
comprises an attribute field for the token from which a substring is derived. 

37. (Currently amended) A machine readable medium including instructions for evaluating an 
input data record having attribute fields containing data by steps of:

accessing a reference table having a number of records organized into attribute fields 
against which an input data record is evaluated; 

evaluating records in the reference table to identify tokens from said attribute fields and 
then evaluating each token with a function to build a vector of token substrings that serve as a 
signature of the token; 

building an index table wherein each entry of the index table contains a token substring,
a column of the reference table, a position within the vector, and a list of records contained 
within the reference table; and 

looking up records in the index table based on the contents of the input record; and

assigning a similarity score to said input data record in relation to a candidate set of 
reference records within the reference table based on a combination of:

the number of common tokens of an evaluation field of the input data record and a 
corresponding field within a reference record from the reference table; 

the similarity of the tokens that are not the same in the evaluation field of the 
input data record and the corresponding field of the reference record from the reference 
table; and 

a weight of the tokens in the evaluation field of the input data record that is based 
on a count of the tokens from the corresponding field contained within the reference 
table.

38. Canceled. 

39. (Original) The machine readable medium of claim 37 wherein a candidate record table is 
built and records from the index table are added to a candidate record table based on vector 
substring representations of the tokens of the input record. 



40. (Original) The machine readable medium of claim 39 wherein a candidate record is added to 
the candidate record table only if a score assigned to the reference record can exceed a threshold 
based on an already evaluated substring representation of the input record. 



41. (Currently amended) A machine readable medium including instructions for evaluating an 
input data record having attribute fields containing data by steps of: 

accessing a reference table having a number of records organized into attribute fields 
against which an input data record is evaluated;

evaluating records in the reference table to identify tokens from said attribute fields and 
then evaluating each token with a function to build a vector of token substrings that serve as a 
signature of the token; 

building an index table wherein each entry of the index table contains a token substring, a 
column of the reference table, a position within the vector, and a list of records contained within 
the reference table; 

looking up records in the index table based on the contents of the input record; 

wherein a candidate record table is built and records from the index table are added to a 
candidate record table based on vector substring representations of the tokens of the input record; 
and 

[[The machine readable medium of claim 39]] wherein once a likely reference record that
matches the evaluation data record with a specified degree of certainty is found, further searching
for records in the reference table is stopped.

42. (Currently amended) The machine readable medium of claim [[38]] 37 wherein a closest K
reference records from the reference table are identified as possible matches with the input 
record. 

43. (Currently amended) The machine readable medium of claim [[38]] 37 wherein reference records
having a similarity score greater than a threshold are identified as candidate records. 

44. (Currently amended) A machine readable medium including instructions for evaluating an 
input data record having attribute fields containing data by steps of:

accessing a reference table having a number of records organized into attribute fields 
against which an input data record is evaluated; 

evaluating records in the reference table to identify tokens from said attribute fields and 
then evaluating each token with a function to build a vector of token substrings that serve as a 
signature of the token; 

building an index table wherein each entry of the index table contains a token substring,
a column of the reference table, a position within the vector, and a list of records contained 
within the reference table; 

looking up records in the index table based on the contents of the input record; 

assigning a similarity score to said input data record in relation to a candidate set of 
reference records within the reference table based on a combination of:

the number of common tokens of an evaluation field of the input data record and a 
corresponding field within a reference record from the reference table; 

the similarity of the tokens that are not the same in the evaluation field of the 
input data record and the corresponding field of the reference record from the reference 
table; and 

a weight of the tokens in the evaluation field of the input data record that is based 
on a count of the tokens from the corresponding field contained within the reference 
table; and 

[[The machine readable medium of claim 38 additionally comprising the step of]]
maintaining a token frequency cache in a high speed access memory for use in assigning weights
to said tokens.

45. (Currently amended) The machine readable medium of claim [[38]] 37 wherein the tokens in
different attribute fields are assigned different weights in determining said score.



46. (Original) The machine readable medium of claim 37 wherein the index table additionally 
comprises an attribute field for a token from which a substring is derived. 